[DOC] Section 1 of user guide/definition of concepts #408
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files

@@ Coverage Diff @@
## main #408 +/- ##
=======================================
Coverage 98.08% 98.08%
=======================================
Files 22 22
Lines 1148 1148
=======================================
Hits 1126 1126
Misses 22 22
bthirion
left a comment
I think it could be useful to have in this section a typology of all VI methods.
man-shu
left a comment
Looks good overall.
Just wondering whether we should introduce the Total Sobol Index in the "Types of VI methods" section or some other place. The original issue #306 mentions it...
| There are two main types of VI methods implemented in HiDimStat: | ||
|
|
||
| 1. Marginal methods: these methods provide importance to all the features | ||
| that are related to the output, even if it is caused by spurius correlation. They |
| that are related to the output, even if it is caused by spurius correlation. They | |
| that are related to the output, even if it is caused by spurious correlation. They |
docs/src/concepts.rst
Outdated
| 1. Marginal methods: these methods provide importance to all the features | ||
| that are related to the output, even if it is caused by spurius correlation. They | ||
| are related with testing if :math:`X^j\perp\!\!\!\!\perp Y`. | ||
| Example of such methods is LOCI. |
I think it would be useful to provide a reference for LOCI, or at least expand the abbreviation.
Yes, I would also suggest the reference but I think they are not yet available.
For LOCI, I found this reference: Ewald, Fiona Katharina, Ludwig Bothmann, Marvin N. Wright, Bernd Bischl, Giuseppe Casalicchio, and Gunnar König. "A guide to feature importance methods for scientific inference." In World Conference on Explainable Artificial Intelligence, pp. 440-464. Cham: Springer Nature Switzerland, 2024.
What I meant was a reference to the implemented class, not a bibliographic reference.
I think the biblio ref should be good enough for now
The reference for the implementation should be only in the docstring of the class. In this case, we can keep a more general bibliography.
docs/src/concepts.rst
Outdated
| 1. Marginal methods: these methods provide importance to all the features | ||
| that are related to the output, even if it is caused by spurius correlation. They | ||
| are related with testing if :math:`X^j\perp\!\!\!\!\perp Y`. | ||
| Example of such methods is LOCI. |
| Example of such methods is LOCI. | |
| An example of such a method is LOCI. |
| i.e., they contribute unique knowledge. They are related with Conditional | ||
| Independence Testing, which consist in testing if | ||
| :math:`X^j\perp\!\!\!\!\perp Y\mid X^{-j}`. Examples of such methods are | ||
| :class:`hidimstat.LOCO` and :class:`hidimstat.CFI`. |
| i.e., they contribute unique knowledge. They are related with Conditional | |
| Independence Testing, which consist in testing if | |
| :math:`X^j\perp\!\!\!\!\perp Y\mid X^{-j}`. Examples of such methods are | |
| :class:`hidimstat.LOCO` and :class:`hidimstat.CFI`. | |
| i.e., they contribute unique knowledge. They are related to Conditional | |
| Independence Testing, which consists of testing whether | |
| :math:`X^j\perp\!\!\!\!\perp Y\mid X^{-j}`. Examples of such methods are | |
| :class:`hidimstat.LOCO` and :class:`hidimstat.CFI`. |
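To make the marginal/conditional distinction discussed in this thread concrete, here is a minimal sketch in plain Python. It does not use HiDimStat and is not how LOCI, LOCO, or CFI are implemented. The simulated data, the helper names `pearson` and `partial_corr`, and the use of correlation and partial correlation as linear stand-ins for the two independence statements are all assumptions made for illustration.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(a, b, given):
    """Correlation between a and b after linearly adjusting both for `given`."""
    r_ab = pearson(a, b)
    r_ag = pearson(a, given)
    r_bg = pearson(b, given)
    return (r_ab - r_ag * r_bg) / math.sqrt((1 - r_ag**2) * (1 - r_bg**2))


# Hypothetical data: y depends on x1 only; x2 is merely correlated with x1.
rng = random.Random(0)
n = 5000
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [0.8 * v + 0.6 * rng.gauss(0, 1) for v in x1]
y = [2.0 * v + rng.gauss(0, 1) for v in x1]

# Marginal view: is X^2 related to Y at all?
marginal_x2 = pearson(x2, y)
# Conditional view: is X^2 still related to Y once X^1 is accounted for?
conditional_x2 = partial_corr(x2, y, x1)
# Same conditional question for the feature that truly drives Y.
conditional_x1 = partial_corr(x1, y, x2)

print(f"marginal    corr(x2, y)      = {marginal_x2:.3f}")
print(f"conditional corr(x2, y | x1) = {conditional_x2:.3f}")
print(f"conditional corr(x1, y | x2) = {conditional_x1:.3f}")
```

In this simulation the marginal score for `x2` is clearly nonzero because `x2` inherits its link to `y` from `x1`, while its conditional score is close to zero. Partial correlation only captures linear dependence, so this is an illustration of the idea rather than a general conditional independence test.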
docs/src/concepts.rst
Outdated
| soon). | ||
|
|
||
| Variable Selection | ||
| ------------------------------- |
| ------------------------------- | |
| ------------------ |
docs/src/concepts.rst
Outdated
|
|
||
|
|
||
| High-dimension and correlation | ||
| ----------------------------------- |
| ----------------------------------- | |
| ------------------------------ |
| that are related to the output, even if it is caused by spurius correlation. They | ||
| are related with testing if :math:`X^j\perp\!\!\!\!\perp Y`. |
| that are related to the output, even if it is caused by spurius correlation. They | |
| are related with testing if :math:`X^j\perp\!\!\!\!\perp Y`. | |
| that are related to the output, even if it is caused by spurius correlation. They | |
| consist of testing whether :math:`X^j\perp\!\!\!\!\perp Y`. |
Maybe that sounds better?
These methods do not directly test whether X is independent of Y, because they are variable importance measures and not just selection procedures. That is why I would say they are implicitly related to this test, but they do not consist of it.
Ok makes sense!
| statistical control to the discoveries made. Simply selecting the most important | ||
| features without such control is not valid. Different forms of guarantees can | ||
| be employed, such as controlling the type-I error or the False Discovery Rate. | ||
| This step is directly related to the task of Variable Selection. |
I might be very wrong, but isn't this section somewhat redundant to the Variable Selection section? Could it be incorporated with the Variable Selection section?
Yes, but I am not sure how. Indeed it is important to make explicit that the power of the library is to provide statistical guarantees too.
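As a small illustration of what "statistical control" adds over simply keeping the top-ranked features, here is a sketch of the Benjamini-Hochberg step-up procedure for False Discovery Rate control, written in plain Python. The function name `benjamini_hochberg` and the p-values are hypothetical and chosen for illustration; this is not HiDimStat's API, and the procedure assumes one valid p-value per feature.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the sorted indices of hypotheses rejected at FDR level alpha."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    return sorted(order[:k_max])


# Hypothetical per-feature p-values.
p_values = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]

selected = benjamini_hochberg(p_values, alpha=0.05)
naive = [i for i, p in enumerate(p_values) if p <= 0.05]

print("FDR-controlled selection:", selected)
print("uncorrected p <= 0.05:   ", naive)
```

With these p-values the uncorrected threshold keeps four features while the FDR-controlled selection keeps fewer, which is the sense in which selection with guarantees differs from ranking alone.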
docs/src/concepts.rst
Outdated
| It allow us to rank the variables from more to less important. | ||
|
|
||
| Here, ``VI`` can be a variable importance method implemented in HiDimStat, | ||
| such as :class:`hidimstat.LOCO` (other methods will support the same API |
It would be better to give the full name of the model before introducing its acronym.
Relates to #306. With @AngelReyero.
For section 1 of the user guide, which contains the definition of all basic concepts.